Entropic Barriers, Frustration and Order: Basic Ingredients in Protein Folding 
We solve a model that takes into account entropic barriers, frustration, and the organization of
a protein-like molecule. For a chain of size $M$, there is an effective folding transition to an ordered
structure. Without frustration, this state is reached in a time that scales as $M^{\lambda}$, with $\lambda \simeq 3$. This
scaling is limited by the amount of frustration, which leads to the dynamical selectivity of proteins:
foldable proteins are limited to ~ 300 monomers, and they are stable in one range of temperatures,
independent of size and structure. These predictions explain generic properties of in vivo proteins.






PACS numbers: 87.10.+e, 82.20.Db, 05.70.Fh, 64.60.Cn



Proteins fold to a well defined three dimensional struc- 
ture, usually referred to as the native state. However, 
two basic ingredients of these biomolecules oppose this 
ordering process. Namely, the large entropy associated 
with the many possible conformations, and the energetic 
frustration present in proteins. From a physical point of 
view, the interplay between order, entropy and frustra- 
tion poses some fundamental questions as to what the 
mechanism of the folding process is: "Is it possible to
rationalize folding as a relaxation process to thermodynamic
equilibrium (as some experiments suggest)?" Or "is some
cell engineered mechanism (e.g. chaperone mediated) needed
to understand the folding process?" In this paper, we address
these questions by solving a protein-like model with all
three aforementioned properties. We find that, indeed, under
some well defined conditions kinetically foldable proteins
can exist. Moreover, the mechanism reconciles very restrictive
properties of in vivo globular proteins: their typical size is
restricted to about 300 residues (or monomers) and they are
stable in a unique range of temperatures!

One obstacle to folding, referred to as Levinthal's
"paradox", has been discussed in several articles dealing
with protein folding dynamics. It relates to the
fact that the time needed to find the native state by sam- 
pling at random the protein phase space is of the order 
of the age of the universe. Nonetheless, proteins fold in 
10"'^ to 1 second. Partial attempts to resolve this para- 
dox have been made by either suggesting some simple 
dynamics or some favorable folding conditions 
Based on scaling arguments, it has been found that if
the ratio of hydrophobic to hydrophilic residues in pro- 
teins is around one, then there is a natural hierarchy in 
the organization of the conformational space of proteins. 
This hierarchy suggests a three stage folding kinetics to 
solve the paradox. This kinetics, sketched in Fig. 1 [9a], 
has been observed in computer simulations [9b], and also 
seems to have been observed in experiments [10].

A second obstacle to folding is the ruggedness of the en- 
ergy landscape, or frustration. Frustration is particularly 
important when sampling globular conformations, as in 
regimes II and III. Analytical approaches to understand 



the role of frustration have led to the conclusion that
random sequences of amino acids should resemble spin
glasses [11], and therefore show glassy dynamics. Hence,
it has been argued that kinetically foldable sequences (i.e. 
those that fold fast, say, in a biological time scale) must 
somehow have minimum frustration [11a]. 

By the end of regime II, most of the native-like struc- 
ture has already been acquired. As indicated in Fig. 1, 
this regime entails an entropy crisis in a rugged energy
landscape. To unveil this process, we choose not
to describe regime III, avoiding a detailed description of
the native state. Thus, we model the folding process
to a generic native-like structure. Given a well defined
native structure, the microscopic model consists of (a)
the definition of what a native-like state is, (b) the characterization
of the space of conformations, and (c) the
dynamics. In what follows, the Boltzmann constant has
been set to one, and we work in dimensionless units.

The energetics involve only short range pairwise interactions
(paired monomers do not interact). A conformation
with $M = 2N$ monomers has at most $N$ non-overlapping
bonds. (a) There are $N$ native contacts, defined
as the set of $N$ distinct pairs of residues which are
closest in space in the native conformation. All other
possible contacts are defined as non-native bonds. Those
structures with all $N$ native contacts formed are called
native-like states. This definition does not uniquely determine
a three dimensional structure; nevertheless, this
generic native-like state is expected to resemble the overall
native structure. (b) We consider two energy scales:
$-\epsilon_N < 0$ for native bonds, and $-\epsilon_{nn}$ for non-native ones.
As shown in Fig. 2A, conformations are classified according
to their number of native ($i$) and non-native ($j$)
bonds. Their energy is given by $E = -i\,\epsilon_N - j\,\epsilon_{nn}$, with
$\epsilon_N > \epsilon_{nn} > 0$. The energy constant $\epsilon_{nn}$ yields an attractive
force between non-native bonds, giving rise to a
frustrated energy landscape. The ratio $\Delta = \epsilon_{nn}/\epsilon_N$ is
defined as the frustration parameter (see Fig. 2A).

To obtain the spectrum, we compute recursively the 
exact crosslinking coefficient $C(i,j)$, i.e. the number of
different combinations of i native and j non-native bonds 
among M distinguishable monomers, 
$C(i,j)$ = [expression not recoverable from the extracted text] , (1)

where $q = i + j$ is the total number of bonds formed.
There is also a residual entropy $S(q)$ associated with
the number of conformations that share the same set
of paired monomers. For a conformation with, say,
one bond, $S(q = 1) = S(l + l_1 + l_2)$, where $l$, $l_1$ and
$l_2 = M - l - l_1 - 2q$ are the lengths of the loop and
of the two free ends of the polypeptide chain. The entropy
of loops and free ends is approximated by that
of a free chain of the same size, $S_0(l)$, i.e.
$S(q) \approx S_0(l) + S_0(l_1) + S_0(l_2) = S_0(M - 2q)$, where, neglecting
logarithmic corrections, $S_0(l) = l \ln w$ and $w$ is a constant.
This assumes that all bonded pairs of
monomers are equally likely, regardless of their separation
along the chain. Hence, the whole conformational space
of our model, $C_T$, can be enumerated as

$$C_T = \sum_{q=0}^{N} \sum_{i+j=q} C(i,j)\, \exp[S_0(M - 2q)] . \qquad (2)$$

Using (2), we obtain the exact partition function, and 
therefore all thermodynamic quantities. Fig. 2A shows a 
scheme of the spectrum for $\Delta = 1/3$, and sketches (c) the
dynamics chosen to mimic the folding process (explained 
in detail in Fig. 2B). 
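
The counting behind Eqs. (1) and (2) can be sketched numerically. The block below is a minimal illustration, not the recursion used in the paper: it assumes that the $N$ native contacts are $N$ disjoint pairs covering all $M = 2N$ monomers, counts the non-native bond sets by inclusion-exclusion, and uses $S_0(l) = l \ln w$. The function names (`pairings`, `crosslinks`, `partition_function`) are hypothetical.

```python
from math import comb, exp, factorial


def pairings(n, k):
    """Ways to choose k disjoint pairs among n labelled monomers."""
    if k < 0 or n < 0 or 2 * k > n:
        return 0
    return factorial(n) // (factorial(k) * 2**k * factorial(n - 2 * k))


def crosslinks(N, i, j):
    """Number of bond sets with i native and j non-native bonds, M = 2N.

    Assumption: the native contacts are N disjoint pairs. After choosing
    which i native pairs are formed, inclusion-exclusion removes every
    pairing of the remaining monomers that would contain a native pair.
    """
    if i < 0 or j < 0 or i + j > N:
        return 0
    unformed = N - i                 # native pairs not yet formed
    free_sites = 2 * unformed        # monomers left unpaired by native bonds
    non_native = sum(
        (-1) ** k * comb(unformed, k) * pairings(free_sites - 2 * k, j - k)
        for k in range(j + 1)
    )
    return comb(N, i) * non_native


def partition_function(N, eps_n, eps_nn, w, T):
    """Z(T) built from Eq. (2) with Boltzmann weights exp(-E/T), k_B = 1."""
    M = 2 * N
    Z = 0.0
    for i in range(N + 1):
        for j in range(N + 1 - i):
            q = i + j
            energy = -i * eps_n - j * eps_nn
            Z += crosslinks(N, i, j) * w ** (M - 2 * q) * exp(-energy / T)
    return Z


if __name__ == "__main__":
    # Degeneracy table for a toy chain with M = 8 monomers.
    for i in range(5):
        print([crosslinks(4, i, j) for j in range(5 - i)])
```

This direct sum is only practical for small chains; for large $M$ or low $T$ the weights overflow ordinary floating point, and one would work with logarithms instead.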

Entropy plays a leading role in selecting folding pathways.
In particular, noting that the likelihood of forming
non-native bonds is much higher than that of forming
native ones, there is a natural tendency to prefer non-native
states. In this sense, even with no energy barriers
($\Delta = 0$), i.e. when every single state is connected to the ground
state by an energy decreasing pathway, non-native states
form entropic barriers to folding [12]!
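
A rough count makes the entropic bias concrete. The sketch below assumes that any two of the $M$ monomers may bond and that exactly $N = M/2$ of those pairs are native; the helper name `native_fraction_unfolded` is hypothetical and the estimate applies only to the fully unbonded state.

```python
from math import comb


def native_fraction_unfolded(M):
    """Fraction of all possible first bonds that are native contacts.

    Assumes M is even, N = M/2 native pairs, and every pair of monomers
    is an allowed bond (no chain-connectivity restrictions).
    """
    N = M // 2
    return N / comb(M, 2)


if __name__ == "__main__":
    for M in (20, 100, 300):
        print(M, native_fraction_unfolded(M))
```

Under these assumptions the fraction equals $1/(M-1)$, so for a chain of 100 monomers only about one in a hundred randomly formed first bonds is native.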

The model has a zero temperature transition. As
shown in Fig. 3A, the specific heat diverges (logarithmically)
as $M \to \infty$. However, proteins are finite, and
for a finite size chain, there is an effective folding transition
at $T = T_f(M)$ (defined by the peak in the specific
heat). The inset in Fig. 3A shows that for $T > T_f$
most bonds are non-native, whereas for $T < T_f$ the
protein orders into a native-like state. The best
fit for $T_f(M, \epsilon_N, \Delta, w)$ and $M = 40$-$1600$, $\epsilon_N = 3$,
£nn =0 — 3^, and w = 1 — 5 yields 

$$T_f(M, \epsilon_N, \Delta) = 1.01\,\epsilon_N (1 - \Delta)/\ln(M) + a/M , \qquad (3)$$

where $a$ ($\approx 2$) depends on $w$ and $\Delta$, and is used as a fitting
parameter for the leading correction-to-scaling term.
For the aforementioned range of parameters, (3) deviates
from the exact values of $T_f$ (measured with four significant
figures) by less than 1%! Note that the leading
term in (3) is independent of $w$. This can be understood
because the folding pathways are mostly determined by
the crosslinking coefficient $C(i,j)$, whereas $w$ acts as a
non-specific entropic weight.
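
As a quick numerical illustration of Eq. (3), the sketch below evaluates the fitted folding temperature, writing the native bond energy as `eps_n` and the frustration parameter as `delta`. The function name is hypothetical, and the default `a = 2.0` is only an assumed placeholder, since $a$ is a fitting parameter that depends on $w$ and on the frustration.

```python
from math import log


def folding_temperature(M, eps_n, delta, a=2.0):
    """Eq. (3): leading 1/ln(M) term plus an a/M correction (k_B = 1).

    `a` is a fit parameter in the text; 2.0 is an assumed placeholder.
    """
    return 1.01 * eps_n * (1.0 - delta) / log(M) + a / M


if __name__ == "__main__":
    for M in (40, 100, 400, 1600):
        print(M, round(folding_temperature(M, eps_n=3.0, delta=1.0 / 3.0), 4))
```

The slow $1/\ln M$ decay is what makes the transition temperature of finite chains nearly size independent over the range of biologically relevant sizes.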



What happens with the dynamics? The time scale to
reach equilibrium, $\tau$, is measured by fitting the exponential
decay, $\exp(-t/\tau)$, of the long-time deviation from
equilibrium of any correlation function. The time $t$ is
measured in updates of the master equation defined by
the transition probabilities in Fig. 2B. Time scales are
independent of the initial condition. To unveil the role
of the entropic barriers we first calculate $\tau$ with no frustration
($\Delta = 0$). As shown in the inset of Fig. 3B, the
equilibrium relaxation time near $T_f$ diverges as $M \to \infty$.
The divergence of the peak in $\tau$ scales as $\tau_c \sim M^{z}$, with
$z = 3.8 \pm 0.25$ (not shown). A striking observation is
that the relaxation time to the native-like state ($T < T_f$)
scales as $\tau_0 \simeq 0.45\,M^{\lambda}$ with $\lambda = 3.02 \pm 0.02$, independent
of $w$ and temperature!
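
The two fits described above (an exponential tail for the relaxation time, then a power law in chain size) can be sketched as follows. This is an illustrative least-squares version using NumPy, not the procedure used in the paper; the function names and the choice of fitting only the last half of the time series are assumptions.

```python
import numpy as np


def relaxation_time(t, deviation, tail_fraction=0.5):
    """Fit |deviation(t)| ~ A exp(-t/tau) on the late-time tail; return tau."""
    t = np.asarray(t, dtype=float)
    y = np.abs(np.asarray(deviation, dtype=float))
    start = int(len(t) * (1.0 - tail_fraction))
    # Straight-line fit of log|deviation| versus t; the slope is -1/tau.
    slope, _ = np.polyfit(t[start:], np.log(y[start:]), 1)
    return -1.0 / slope


def scaling_fit(sizes, taus):
    """Fit tau ~ prefactor * M**exponent on a log-log scale."""
    slope, intercept = np.polyfit(np.log(sizes), np.log(taus), 1)
    return slope, float(np.exp(intercept))


if __name__ == "__main__":
    # Synthetic data: a slow mode (tau = 25) plus a fast transient.
    t = np.arange(200)
    signal = 0.7 * np.exp(-t / 25.0) + 0.3 * np.exp(-t / 3.0)
    print(relaxation_time(t, signal))
```

Restricting the fit to late times matters because fast transients contaminate the early decay and would bias the estimate toward shorter times.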

On adding frustration, a frustration limited folding
time scale $\tau_\Delta$ enters into the problem. Hence, as shown
in Fig. 3B, below some temperature $T_\Delta$ the ordered state
is achieved in a time scale that diverges as $T \to 0$ as

$$\tau_\Delta \simeq 0.5\,M^{\lambda} \exp(2\epsilon_N \Delta / T)/w^{4.0} , \qquad (4)$$

with $\lambda = 3.14 \pm 0.07$. This expression combines both
entropic barriers and the largest minimal energy barrier
$2\epsilon_N\Delta$ on the landscape (see Fig. 2A). The dependence
on $w$ is mostly due to the normalization factor of the transition
probabilities, likely an artifact of the dynamics.

For $\Delta = 1/3$, the peak in $\tau$ diverges with an exponent
$z = 4.3 \pm 0.2$ (not shown). More importantly, we find a
small range of temperatures around $T_\Delta = 0.3$ where,
independently of size, the native-like state is reached in
a time scale $\sim \tau_0$. Away from this temperature, proteins
will either fold too slowly ($T < T_\Delta$) or they will not fold at
all ($T > T_f$). As far as we know, this is the first time a
model has predicted the dynamical selection of a whole
class of proteins in a well defined range of temperatures!
This behavior is robust; however, the size of the stability
region and $T_\Delta$ have a smooth dependence on $\Delta$ and $w$.
By further increasing $\Delta$, $\tau_\Delta$ takes over the relaxation
dynamics, even at $T > T_f$. Fig. 3B shows that already for
$\Delta = 2/3$ there is no dynamical evidence for the ordering
transition at $T_f(M)$. Substituting $T = T_f$ from (3) into (4)
yields the frustration limited time scale at the transition

$$\tau_\Delta(T_f) \simeq 0.5\,M^{\zeta}/w^{4.0} \quad \text{with} \quad \zeta = \lambda + 2\Delta/(1 - \Delta) . \qquad (5)$$
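
The substitution leading to Eq. (5) can be written out step by step. The sketch below writes the frustration parameter as $\Delta$, the entropic exponent as $\lambda$, the combined exponent as $\zeta$, and uses $\tau_\Delta(T_f)$ as a label for the time scale at the transition (notation introduced here for illustration). It keeps only the leading term of Eq. (3), dropping the $a/M$ correction and approximating the prefactor 1.01 by 1; both are simplifying assumptions.

```latex
\begin{align}
\tau_\Delta(T) &\simeq \frac{0.5\,M^{\lambda}}{w^{4.0}}
      \exp\!\left(\frac{2\epsilon_N\Delta}{T}\right),
  \qquad
  T_f \simeq \frac{\epsilon_N\,(1-\Delta)}{\ln M}, \\
\frac{2\epsilon_N\Delta}{T_f} &\simeq \frac{2\Delta}{1-\Delta}\,\ln M
  \quad\Longrightarrow\quad
  \exp\!\left(\frac{2\epsilon_N\Delta}{T_f}\right) \simeq M^{2\Delta/(1-\Delta)}, \\
\tau_\Delta(T_f) &\simeq \frac{0.5}{w^{4.0}}\,M^{\lambda + 2\Delta/(1-\Delta)}
  = \frac{0.5\,M^{\zeta}}{w^{4.0}},
  \qquad \zeta = \lambda + \frac{2\Delta}{1-\Delta}.
\end{align}
```

The native bond energy cancels in the exponent, which is why the combined exponent depends only on the frustration ratio and diverges as that ratio approaches one.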

Thus, a native-like conformation is reached in a folding
time scale $\tau_f \approx \max\{\tau_\Delta, \tau_0\}$. Fast folding sequences will
fold in $\tau_f \approx \tau_0$! Eq. 5 embodies the expected divergence
of the folding time in the "glassy" limit $\Delta \to 1$. Indeed,
as shown in Fig. 4, a small change in $\Delta$ can lead to an
enormous increase in $\tau_f$.
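
To see how the two time scales compete, the fitted expressions can be evaluated directly. The sketch below is illustrative only: it takes the prefactors and exponents quoted in the text (0.45 and 3.02 without frustration; 0.5, 3.14 and a $w^{4.0}$ normalization with frustration) at face value, measures time in master-equation updates, and uses hypothetical function names.

```python
from math import exp, log


def tau_entropic(M, lam0=3.02):
    """Relaxation time with no frustration, in master-equation updates."""
    return 0.45 * M**lam0


def tau_frustration(M, T, eps_n, delta, w, lam=3.14):
    """Frustration limited time scale of Eq. (4), with k_B = 1."""
    return 0.5 * M**lam * exp(2.0 * eps_n * delta / T) / w**4.0


def zeta(delta, lam=3.14):
    """Combined exponent of Eq. (5); diverges as delta approaches 1."""
    return lam + 2.0 * delta / (1.0 - delta)


def folding_time(M, T, eps_n, delta, w):
    """Folding time scale taken as the larger of the two contributions."""
    return max(tau_frustration(M, T, eps_n, delta, w), tau_entropic(M))


if __name__ == "__main__":
    for delta in (0.0, 1.0 / 3.0, 0.5, 2.0 / 3.0):
        print(round(delta, 3), round(zeta(delta), 3))
```

Evaluating `tau_frustration` at the leading-order transition temperature, `eps_n * (1 - delta) / log(M)`, reproduces the power law `0.5 * M**zeta(delta) / w**4.0`, which is a useful consistency check between Eqs. (3), (4) and (5).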

Our results are in excellent agreement with simulations 
of random sequences of protein-like chains, where it has 
been shown that few sequences can fold, while most sequences
do not, not even to native-like intermediates.
Fast folding in a limited range of temperatures
has also been observed in Monte Carlo simulations of lattice
models of proteins, and has been suggested
by Wolynes and collaborators [8,11a] in the context of a
glass transition.

It can be argued that one update of the master equa- 
tion should roughly correspond to to = 10^'' sec, i.e. 
the time scale to diffuse over a length of a few residues. This
leads to the conclusion that, even when there is no frus- 
tration, folding in a biological time scale of seconds is 
restricted to globular proteins with 300 or fewer residues
(see Fig. 4). Smaller proteins with $M \approx 40$ may fold as
fast as lO^'^ sec. With frustration, proteins fold fast if 
$M < 300$ for $\Delta = 1/3$, and $M < 20$ for $\Delta = 1/2$. These
limited sizes and time scales are reasonable when com- 
pared to those found in nature. Hence, we conclude that 
the typical ratio of free energies of a random (or non- 
native) bond to that of a native bond must be close to 
1/3. We note that $\Delta = 1/3$ is compatible with estimates
for the relative free energies of non-bonded interactions
in proteins.

The overall relaxation must also involve regime III. 
The average time to escape from one low-energy native- 
like state to another will depend on the energy bar- 
rier separating them. The structural rearrangement of 
a native-like state may involve freeing a loop or, at most, 
a surface from its neighbors. Accordingly, the number of 
broken bonds should scale as $N^{1/3}$ for a loop, or $N^{2/3}$ for
a surface. An estimate for the escape time can then be 
to exp[eAr7v(aAf)^/^/r] sec smaller than tq for num- 
bers like a = 1/3 and T = 0.4. This analysis points out 
to the conclusion that acquiring the native-like structural 
features is the rate limiting step of the folding process. 

We have solved a non-sequential model of protein folding
that focuses on the rearrangement of random conformations
to close-to-native structures. The most striking
predictions are that, regardless of the details of the
model and native structure, folding is limited to a well defined
range of temperatures ($T_\Delta \lesssim T \lesssim T_f$ and
$\epsilon/k_B T \sim 3$-$10$) and to globular proteins with $M \approx 300$ or fewer
residues. Away from these limits, proteins do not fold.
The model predicts a small (logarithmic) excess of heat 
at the ordering transition Tf . We expect Tf and Tg to be 
rather close for kinetically foldable proteins [9b]. Hence, 
experimentally, it may be difficult to resolve the excess 
of heat from these two transitions. Entropic barriers determine
the time needed to find a native-like state, which
scales as $M^{\lambda}$, where $\lambda \simeq 3$. Based on polymer dynamics
insights, a similar exponent has already been predicted
for the second stage of the folding kinetics. The
dynamics is governed by a multiplicity of folding pathways
with non-native-like transients. The limitation to
the aforementioned scaling time is the amount of frustration
$\Delta$, defined as the relative (attractive) strength of
non-native and native bonds. If the amount of frustration
is too large ($\Delta > 1/3$), the time scale to reach the
native-like state scales as $M^{\zeta}$, where $\zeta = \lambda + 2\Delta/(1 - \Delta)$.
In this case, folding in a biological time scale is slow and
restricted to very small protein sizes. We conclude that
the acquisition of native-like structure is the rate limit- 
ing step of folding. Folding can be rationalized based 



on thermodynamic stability, at least that of native-like 
states. Whether overcoming the rather large energy bar- 
riers needed to differentiate these states requires the me- 
diation of, say, chaperones remains uncertain. 
FIG. 1. Scheme of the three stage folding kinetics. I.-
Starting from a fully unfolded conformation R, the initial 
regime corresponds to a fast down-hill energy minimiza- 
tion process, where the ruggedness of the energy surface 
plays almost no role. II.- Here, the mostly non-native
contacts — i.e. bonds not present in the native state N — 
rearrange into native ones, reaching fairly stable and 
compact native-like structures. III.- The third regime 
corresponds to the search of the native state among a 
small set of close-to-native conformations separated by 
rather large energy barriers. These barriers are mostly 
due to the cooperativity required by excluded volume 
interactions to change structural features buried in the 
protein core. 



FIG. 2. (A) Spectrum. Each level $[i, j]$ includes all
possible distinct conformations with $i$ native and $j$ non-native
bonds. Some typical numbers for
$M = 100$ and $w = 5$ are shown in parentheses. The
slope $\Delta$ measures the amount of energetic frustration in
the model. Frustration is maximum when $\Delta = 1$. As
indicated by the solid and dotted lines, the spectrum already
suggests a multiplicity of pathways for connecting
the states. Note, e.g., the minimum energy barrier pathway
connecting states $[0, N]$ and $[N, 0]$, which crosses
over $N - 1$ small barriers of size $\epsilon_{nn}$ and one of size
$2\epsilon_{nn}$. (B) Transition probabilities for states $[i, j]$. From
each state $[i, j]$ one can either form a new bond (non-local
event) or break one (local event). As expected in
a real physical situation, the entire temperature dependence
is in the backward transition probabilities, and
comes from the energy penalty of breaking a bond. The
forward probabilities, however, depend on the likelihood
of forming a new bond. Given that there are $q$ bonds
formed, the free sites can form $\binom{M-2q}{2}$ new bonds. $P_n$
($P_{nn} = 1 - P_n$) is the probability of forming a native
bond. Detailed balance requires an extra factor $k(q)$
proportional to the ratio of the number of conformations
with $q + 1$ bonds to that with $q$ bonds. The probabilities
are uniformly normalized such that the largest
transition probability of the master equation is one half.



FIG. 3. (A) Thermodynamics for $\epsilon_N = 3$, $\Delta = 1/3$,
and $w = 5$. Specific heat (fluctuations of the reduced energy
$E/T$) as a function of temperature, for $M = 24$,
50, 100, 200, 400, 800, and 1600. The dashed line shows
the power-law divergence of the specific heat peak as
$T_f(M) \to 0$. The inset shows the average number of native
and non-native bonds as a function of temperature.
Axes are in dimensionless units, and the data are exact. The dotted
line indicates $T_f$ for $M = 50$. (B) Dynamics. Scaled
relaxation time for $\epsilon_N = 3$ and $w = 5$ as a function of
temperature. The inset shows the case $\Delta = 0$: for $T < 0.4$,
the data for all values of $M$ collapse to a constant (see
$\tau_0$ in text). The main figure shows the cases $\Delta = 1/3$ and
$\Delta = 2/3$. Solid lines correspond to Eq. 4. The horizontal
axis for $\Delta = 2/3$ has been rescaled by a factor 1.15. The
symbols denote $M = 100$ x, 80 +, 50 △, 30 □, 20 o, and
10 *. Error bars on $\tau$ are less than 0.4%. Dotted lines
are a guide to the eye.



FIG. 4. Scaling of the relaxation time to a native- 
like state as a function of size and frustration. Dotted 
line corresponds to the case with no frustration. Dashed 
line indicates a correspondence between time measured 
in updates and seconds. 